Variable Importance using Decision Trees, Supplementary Material for NIPS 2017 paper
Abstract
We give a complete account of the theoretical results presented in the main text. We provide a detailed analysis of the DStump algorithm in the context of a general additive regression model with uncorrelated design, and we derive the results for the linear case as a special case of the general theory. Our analysis is high-dimensional and non-asymptotic and is, to our knowledge, the first such analysis of the feature selection properties of decision trees. We show that even in the high-dimensional setting, where the number of features p is much larger than the sample size n, feature importance scores based on impurity reduction contain enough information for consistent model selection. Additionally, we provide simulation experiments to supplement the results in the main text.
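As a rough illustration of the idea in the abstract, the sketch below scores each feature by the impurity (variance) reduction of a single split and keeps the s highest-scoring features. It is a minimal sketch under stated assumptions, not the paper's definition of DStump: the median split point, the function names `stump_impurity_reduction` and `rank_features`, the sparse linear model, and the values of n, p and s are all illustrative choices that do not come from the text.

```python
import numpy as np


def stump_impurity_reduction(x, y):
    """Variance reduction of y from one split of x at its median.

    Assumption: the split point is the sample median, so both children
    hold (almost) the same number of samples.
    """
    order = np.argsort(x, kind="stable")
    y_sorted = y[order]
    half = len(y) // 2
    left, right = y_sorted[:half], y_sorted[half:]
    parent_impurity = np.var(y)
    child_impurity = (len(left) * np.var(left) + len(right) * np.var(right)) / len(y)
    return parent_impurity - child_impurity


def rank_features(X, y, s):
    """Score every column of X with a one-split stump and return the top s."""
    scores = np.array(
        [stump_impurity_reduction(X[:, j], y) for j in range(X.shape[1])]
    )
    top = np.argsort(scores)[::-1][:s]
    return top, scores


# Hypothetical simulation: p much larger than n, independent features,
# and only the first s features enter the (linear) regression function.
rng = np.random.default_rng(0)
n, p, s = 400, 2000, 3
X = rng.uniform(-1.0, 1.0, size=(n, p))
beta = np.zeros(p)
beta[:s] = 2.0
y = X @ beta + 0.1 * rng.standard_normal(n)

selected, scores = rank_features(X, y, s)
print("selected features:", sorted(selected.tolist()))
```

With an uncorrelated design and a signal this strong, one would expect the active features to receive the largest scores, which is the behavior the abstract describes for the high-dimensional regime.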
Related resources
Variable Importance Using Decision Trees
Decision trees and random forests are well-established models that not only offer good predictive performance but also provide rich feature importance information. While practitioners often employ variable importance methods that rely on this impurity-based information, these methods remain poorly characterized from a theoretical perspective. We provide novel insights into the performance of t...
Supplementary Material to “Cooled and Relaxed Survey Propagation for MRFs”
This is the supplementary material for the submission to NIPS 2007, entitled “Cooled and Relaxed Survey Propagation for MRFs”. The purpose of this material is to prove the update equations of Relaxed Survey Propagation (RSP) in the main paper.
Turning an Urban Scene Video into a Cinemagraph Supplementary Material
We obtained the training data as follows. First, we downloaded stationary video footage from YouTube and manually annotated 2D display regions at the pixel level. Second, we ran the same video segmentation algorithm to generate segments. Lastly, we labeled each segment as a positive (resp. negative) sample if more (resp. less) than 80% of the segment overlaps with the annotated display...
Basis-Function Trees as a Generalization of Local Variable Selection Methods
Local variable selection has proven to be a powerful technique for approximating functions in high-dimensional spaces. It is used in several statistical methods, including CART, ID3, C4, MARS, and others (see the bibliography for references to these algorithms). In this paper I present a tree-structured network which is a generalization of these techniques. The network provides a framework for ...
Publication date: 2017